Hungarian noun phrase extraction using rule-based and hybrid methods
We implement and revise Kornai's grammar of Hungarian NPs [11] to create a parser that identifies noun phrases in Hungarian text. After making several practical amendments to our morphological annotation system of choice, we formulate rules to account for some specific phenomena of the Hungarian language not covered by the original rule system. Although the performance of the final parser is still inferior to state-of-the-art machine learning methods, we use its output successfully to improve the performance of one such system.
Book reviews
Daniel Currie Hall: The role and representation of contrast in phonological theory. University of Toronto, Toronto, 2007, 277 pp.; David Odden: Introducing phonology. Cambridge University Press, Cambridge, 2005, 348 pp.
Building definition graphs using monolingual dictionaries of Hungarian
We adapt to Hungarian the core functionalities of the 4lang library [12], which builds 4lang-style semantic representations [7] from raw text using an external dependency parser as proxy, and processes definitions of monolingual dictionaries to build definition graphs for concepts not defined in the hand-written 4lang dictionary [8]. Section 2 provides a short overview of the 4lang formalism, and Section 3 describes the architecture of the text_to_4lang and dict_to_4lang systems. We describe in detail the steps taken to adapt our system to Hungarian in Section 4. The new tool is evaluated in Section 5. The new components presented in this paper are part of the latest version of the 4lang library, which is available under an MIT license from http://www.github.com/kornai/4lang
Structure Learning in Weighted Languages
We present Minimum Description Length techniques for learning the structure of weighted languages. MDL is already widely used both for segmentation and classification tasks, and here we show it can be used to formalize further important tools in the descriptive linguists’ toolbox, including the distinction between accidental and systematic gaps in the data, the detection of ambiguity, the selective discarding of data, and the merging of categories.
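To make the idea of merging categories under MDL concrete, the following is a minimal generic sketch, not the method of the paper. It assumes a two-part code in which each category is described by a multinomial distribution, with a parameter cost of half of log2(n) bits per free parameter (a common BIC-style approximation, chosen here as an assumption). The function names and the toy counts are hypothetical.

```python
import math
from collections import Counter


def data_cost_bits(counts):
    """Bits needed to encode the observations under their own ML multinomial."""
    n = sum(counts.values())
    return -sum(c * math.log2(c / n) for c in counts.values() if c > 0)


def model_cost_bits(counts, n_total):
    """Assumed parameter cost: 0.5 * log2(n_total) bits per free parameter."""
    free_params = len(counts) - 1
    return 0.5 * free_params * math.log2(n_total)


def description_length(groups):
    """Two-part description length (model + data) of a list of categories."""
    n_total = sum(sum(g.values()) for g in groups)
    return sum(model_cost_bits(g, n_total) + data_cost_bits(g) for g in groups)


def should_merge(a, b):
    """Merge two categories if one shared distribution gives a shorter code."""
    separate = description_length([a, b])
    merged = description_length([a + b])
    return merged < separate


# Toy data: two categories with nearly identical behaviour, and two that differ.
similar_a = Counter({"x": 50, "y": 50})
similar_b = Counter({"x": 48, "y": 52})
distinct_a = Counter({"x": 90, "y": 10})
distinct_b = Counter({"x": 10, "y": 90})

print(should_merge(similar_a, similar_b))
print(should_merge(distinct_a, distinct_b))
```

In this sketch, merging pays off when the bits saved by dropping one set of parameters outweigh the extra bits needed to encode the data under the shared distribution, so the near-identical pair is merged and the clearly distinct pair is kept apart.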